Zippy: A Framework for Computation and Visualization on a GPU Cluster
Authors
Abstract
Due to its high performance/cost ratio, a GPU cluster is an attractive platform for large-scale general-purpose computation and visualization applications. However, the programming model for high-performance general-purpose computation on GPU clusters remains a complex problem. In this paper, we introduce the Zippy framework, a general and scalable solution to this problem. It abstracts GPU cluster programming with a two-level parallelism hierarchy and a non-uniform memory access (NUMA) model. Zippy preserves the advantages of both the message-passing and shared-memory models. It employs global arrays (GA) to simplify communication, synchronization, and collaboration among multiple GPUs. Moreover, it exposes data locality to the programmer for optimal performance and scalability. We present three example applications developed with Zippy: sort-last volume rendering, Marching Cubes isosurface extraction and rendering, and lattice Boltzmann flow simulation with online visualization. They demonstrate that Zippy can ease the development and integration of parallel visualization, graphics, and computation modules on a GPU cluster.
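The global-array idea in the abstract (one logical array, physically partitioned, with data locality visible to the programmer) can be illustrated with a toy sketch. This is not Zippy's API: the class `GlobalArray` and its methods `owner`, `local_range`, `get`, and `put` are hypothetical names chosen for illustration, and the "nodes" are simulated inside a single process rather than running on real GPUs.

```python
class GlobalArray:
    """Toy 1-D global array, block-partitioned across simulated nodes.

    Hypothetical illustration only; it does not reproduce Zippy's interface.
    """

    def __init__(self, length, num_nodes):
        self.length = length
        self.num_nodes = num_nodes
        # Ceiling division: every node owns at most `block` elements.
        self.block = -(-length // num_nodes)
        # One local buffer per node; trailing nodes may own fewer elements.
        self.local = [
            [0.0] * max(0, min(self.block, length - n * self.block))
            for n in range(num_nodes)
        ]

    def owner(self, index):
        """Return the node that physically stores `index`."""
        return index // self.block

    def local_range(self, node):
        """Expose locality: the half-open index range stored on `node`."""
        start = min(node * self.block, self.length)
        return start, min(start + self.block, self.length)

    def _check(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)

    def get(self, node, index):
        """Read `index` on behalf of `node`; also report if it was remote."""
        self._check(index)
        home = self.owner(index)
        return self.local[home][index - home * self.block], home != node

    def put(self, node, index, value):
        """Write `index` on behalf of `node`; return True if it was remote."""
        self._check(index)
        home = self.owner(index)
        self.local[home][index - home * self.block] = float(value)
        return home != node


ga = GlobalArray(length=10, num_nodes=4)

# Each node fills only the part it owns, so every write is local.
for node in range(ga.num_nodes):
    start, stop = ga.local_range(node)
    for i in range(start, stop):
        ga.put(node, i, node)

# Node 0 reads an element owned by node 3: same call, but flagged as remote.
value, was_remote = ga.get(0, 9)
```

The point of the sketch is the combination named in the abstract: any node can address any element through one interface, as in a shared-memory model, while `local_range` and the remote flag let the caller keep work on locally owned data, as a message-passing program would.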
Similar resources
Implementing the lattice Boltzmann model on commodity graphics hardware
Modern graphics processing units (GPUs) can perform general-purpose computations in addition to the native specialized graphics operations. Due to the highly parallel nature of graphics processing, the GPU has evolved into a many-core coprocessor that supports high data parallelism. Its performance has been growing at a rate of squared Moore’s law, and its peak floating point performance exceeds...
Supplemental Material: MPI Derived Datatypes Processing on Noncontiguous GPU-resident Data
A number of efforts have been undertaken to integrate GPU functionality into an HPC environment, with modifications at the application, programming model, and library levels to account for a discrete GPU main memory space. Work related to MVAPICH [1], [2] is discussed in Section 2.3 of the Main Material. At the application level, algorithms that use both MPI and GPUs, such as Jacobsen et al.’s ...
Fast Cellular Automata Implementation on Graphic Processor Unit (GPU) for Salt and Pepper Noise Removal
Noise removal is commonly applied as a pre-processing step before subsequent image processing tasks, because noise arises during acquisition or transmission. A common problem in imaging systems that use CMOS or CCD sensors is the appearance of salt and pepper noise. This paper presents a Cellular Automata (CA) framework for noise removal of distorted image by the salt an...
Accelerating 3D Visualization in Reservoir Modeling System with Programmable Hardware
This study presents a new method for 3D visualization in a reservoir modeling system that uses the computation power of modern programmable graphics hardware (GPU). The proposed scheme is devised to achieve parallel processing of massive reservoir logging data. By taking advantage of the GPU's parallel processing capability, moreover, the performance of our scheme is discussed in comparison with th...
Tera-scale Astronomical Data Analysis and Visualization
We present a high-performance, graphics processing unit (GPU)-based framework for the efficient analysis and visualization of (nearly) terabyte (TB)-sized 3-dimensional images. Using a cluster of 96 GPUs, we demonstrate for a 0.5 TB image: (1) volume rendering using an arbitrary transfer function at 7–10 frames per second; (2) computation of basic global image statistics such as the mean intens...
Journal title:
- Comput. Graph. Forum
Volume 27, Issue
Pages -
Publication date 2008